AITopics | approximate and online reinforcement learning


approximate and online reinforcement learning



Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

Neural Information Processing Systems, Nov-20-2025, 22:06:46 GMT

Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control. In a recent work (Efroni et al., 2018), multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty arising with soft-policy updates: even in the absence of approximations, and contrary to the 1-step-greedy case, monotonic policy improvement is not guaranteed unless the update stepsize is sufficiently large. Taking particular care of this difficulty, we formulate and analyze online and approximate algorithms that use such a multi-step greedy operator.
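To make the abstract's two objects concrete, here is a minimal sketch, assuming a small tabular MDP with known transitions `P[s][a][s2]` and rewards `R[s][a]`. It computes an h-step greedy policy as the policy that is greedy with respect to the (h-1)-fold Bellman optimality backup of v, and a soft-policy update as the state-wise mixture (1 - alpha) * pi + alpha * pi_h with stepsize alpha. All names (`h_greedy`, `evaluate`, `soft_update`, the toy chain) are hypothetical illustrations, not the authors' code.

```python
from typing import List

Policy = List[List[float]]  # pi[s][a]: probability of action a in state s


def q_from_v(P, R, v: List[float], gamma: float) -> List[List[float]]:
    """One-step lookahead: Q(s, a) = R[s][a] + gamma * sum_s2 P[s][a][s2] * v[s2]."""
    return [[R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
             for a in range(len(R[s]))]
            for s in range(len(R))]


def greedy(q: List[List[float]]) -> Policy:
    """Deterministic greedy policy (ties broken toward the lowest action index)."""
    pi = []
    for row in q:
        best = max(range(len(row)), key=row.__getitem__)
        pi.append([1.0 if a == best else 0.0 for a in range(len(row))])
    return pi


def h_greedy(P, R, v: List[float], gamma: float, h: int) -> Policy:
    """h-step greedy policy w.r.t. v: greedy w.r.t. T^(h-1) v, where T is the
    Bellman optimality backup. h = 1 recovers the usual 1-step greedy policy."""
    w = list(v)
    for _ in range(h - 1):
        w = [max(row) for row in q_from_v(P, R, w, gamma)]
    return greedy(q_from_v(P, R, w, gamma))


def evaluate(P, R, pi: Policy, gamma: float, iters: int = 2000) -> List[float]:
    """Policy evaluation by iterating v <- r^pi + gamma * P^pi v (a gamma-contraction)."""
    v = [0.0] * len(R)
    for _ in range(iters):
        q = q_from_v(P, R, v, gamma)
        v = [sum(pi[s][a] * q[s][a] for a in range(len(q[s]))) for s in range(len(q))]
    return v


def soft_update(pi: Policy, pi_new: Policy, alpha: float) -> Policy:
    """Soft update with stepsize alpha: (1 - alpha) * pi + alpha * pi_new, state by state."""
    return [[(1 - alpha) * p + alpha * pn for p, pn in zip(row, row_new)]
            for row, row_new in zip(pi, pi_new)]


# Hypothetical 2-state chain: action 0 moves to state 0, action 1 moves to
# state 1, and taking action 1 in state 1 pays reward 1.
P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
v_pi = evaluate(P, R, uniform, 0.9)
pi_h = h_greedy(P, R, v_pi, 0.9, h=3)
pi_next = soft_update(uniform, pi_h, alpha=0.5)
print(pi_h, pi_next)
```

With h = 1 this mixture is the classical conservative update, which never decreases the value of any state. The abstract's point is that for larger lookahead this guarantee can fail unless alpha is large enough; a chain this small does not reproduce that counterexample, it only exercises the definitions.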

approximate and online reinforcement learning, multiple-step greedy policy, name change


Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)


Reviews: Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

Neural Information Processing Systems, Oct-7-2024, 08:52:47 GMT

The authors first show a negative result: soft-policy updates using multi-step greedy policies do not guarantee policy improvement. The authors then propose an algorithm that uses cautious soft updates (update toward the κ-greedy policy only when it is assured to improve; otherwise, stay with the one-step greedy policy) and show that it converges to the optimal policy. Lastly, the authors study hard updates by extending approximate policy iteration (API) algorithms to the multi-step greedy setting. Comments: 1. Theorem 2 presents an interesting and surprising result. Although the authors present an example in the proof sketch, I wonder whether they could provide more intuition behind it. Based on the theorem, for the multi-step greedy policy it seems that h needs to be bigger than 2. So I suspect that h = 2 will still work (meaning a small alpha could exist)? Obviously h = 1 works, but then why, when h = 3, does the soft update suddenly stop working unless alpha is exactly equal to 1? I would expect that a larger alpha would be required as h grows.
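The cautious rule the review describes can be sketched as follows, under stated assumptions. The κ-greedy policy with respect to v is taken to be the optimal policy of a surrogate MDP with reward R(s, a) + (1 - κ)γ E[v(s')] and discount κγ, following our reading of the prior work cited in the abstract (Efroni et al., 2018). "Assured to improve" is replaced by a hypothetical exact-evaluation check that no state's value decreases; the paper's actual criterion may differ. All function names are illustrative.

```python
from typing import List, Tuple

Policy = List[List[float]]  # pi[s][a]: probability of action a in state s


def q_from_v(P, R, v: List[float], gamma: float) -> List[List[float]]:
    """Q(s, a) = R[s][a] + gamma * sum_s2 P[s][a][s2] * v[s2]."""
    return [[R[s][a] + gamma * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
             for a in range(len(R[s]))]
            for s in range(len(R))]


def greedy(q: List[List[float]]) -> Policy:
    """Deterministic greedy policy (ties broken toward the lowest action index)."""
    pi = []
    for row in q:
        best = max(range(len(row)), key=row.__getitem__)
        pi.append([1.0 if a == best else 0.0 for a in range(len(row))])
    return pi


def evaluate(P, R, pi: Policy, gamma: float, iters: int = 2000) -> List[float]:
    """Policy evaluation by fixed-point iteration of v = r^pi + gamma * P^pi v."""
    v = [0.0] * len(R)
    for _ in range(iters):
        q = q_from_v(P, R, v, gamma)
        v = [sum(pi[s][a] * q[s][a] for a in range(len(q[s]))) for s in range(len(q))]
    return v


def mix(pi: Policy, pi_new: Policy, alpha: float) -> Policy:
    """Soft update: state-wise mixture (1 - alpha) * pi + alpha * pi_new."""
    return [[(1 - alpha) * p + alpha * pn for p, pn in zip(row, row_new)]
            for row, row_new in zip(pi, pi_new)]


def kappa_greedy(P, R, v, gamma, kappa, iters=2000) -> Policy:
    """Assumed definition: optimal policy of a surrogate MDP with reward
    R[s][a] + (1 - kappa) * gamma * E[v(s')] and discount kappa * gamma.
    kappa = 0 gives the 1-step greedy policy; kappa = 1 solves the original MDP."""
    R_k = q_from_v(P, R, v, (1 - kappa) * gamma)  # shaped reward of the surrogate MDP
    w = [0.0] * len(R)
    for _ in range(iters):  # value iteration on the surrogate MDP
        w = [max(row) for row in q_from_v(P, R_k, w, kappa * gamma)]
    return greedy(q_from_v(P, R_k, w, kappa * gamma))


def cautious_soft_update(P, R, pi, gamma, kappa, alpha, tol=1e-9) -> Tuple[Policy, str]:
    """Hypothetical cautious rule: step toward the kappa-greedy policy only if exact
    evaluation shows no state's value decreases; otherwise step toward the 1-step
    greedy policy, whose mixture never decreases any state's value."""
    v = evaluate(P, R, pi, gamma)
    candidate = mix(pi, kappa_greedy(P, R, v, gamma, kappa), alpha)
    v_cand = evaluate(P, R, candidate, gamma)
    if all(vc >= vo - tol for vc, vo in zip(v_cand, v)):
        return candidate, "kappa-greedy"
    return mix(pi, greedy(q_from_v(P, R, v, gamma)), alpha), "one-step greedy"


# Hypothetical 2-state chain: action 0 moves to state 0, action 1 moves to
# state 1, and taking action 1 in state 1 pays reward 1.
P = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
new_pi, used = cautious_soft_update(P, R, uniform, gamma=0.9, kappa=0.5, alpha=0.5)
print(used, new_pi)
```

On this toy chain the κ-greedy and 1-step greedy policies coincide, so the check passes trivially; it becomes informative only on MDPs where mixing toward the κ-greedy policy can hurt some state. In the approximate and online settings the paper targets, exact evaluation is unavailable, so a practical version of this test would presumably have to rely on estimates.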

approximate and online reinforcement learning, multiple-step greedy policy, oracle


Genre: Instructional Material > Online (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)


Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

Efroni, Yonathan; Dalal, Gal; Scherrer, Bruno; Mannor, Shie

Neural Information Processing Systems, Feb-14-2020, 16:27:16 GMT

algorithm, approximate and online reinforcement learning, multiple-step greedy policy


Genre: Instructional Material > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
